Abstract

An orthogonal discrete auditory transform (ODAT) from sound
signal to spectrum is constructed by combining the auditory spreading
matrix of Schroeder et al. and the time-one map of a discrete nonlocal
Schrodinger equation. Thanks to the dispersive smoothing property of
the Schrodinger evolution, the ODAT spectrum is smoother than that of
the discrete Fourier transform (DFT), consistent with human audition.
ODAT and DFT are compared in signal denoising tests with the spectral
thresholding method. The signals are noisy speech segments. ODAT
outperforms DFT in signal-to-noise ratio (SNR) when the noise level
is relatively high.
1 Introduction



Acoustic signal processing can benefit significantly from utilizing properties
of human audition, e.g. perceptual coding in the MP3 technology of music
compression [12, 13]. In [?], an invertible discrete auditory transform (DAT)
is formulated by the present authors to map sound signal to auditory
spectrum. DAT is better adapted to the spectral features of the ear than the
Fourier transform. It incorporates the auditory spreading functions of
Schroeder, Atal and Hall to achieve a smoother spectrum than that of the
discrete Fourier transform (DFT), and better performance in denoising under
spectral thresholding. However, such a transform has redundancy in the sense
that the image of a discrete vector lies in a higher dimensional space, similar
to tight frames in wavelets [3, ?].

In this paper, such redundancy is removed by constructing an orthogonal
(unitary) matrix with a spreading property over frequency bands comparable
to the critical bands in hearing. Critical bands (Table 10.1, p. 309, [12])
characterize the bandwidth of the human auditory filter. The auditory
orthogonal matrix is obtained from the time-one map of a spatially discrete
nonlocal Schrodinger equation. The Schrodinger equation conserves the $L^2$
norm, or Euclidean length, which implies the orthogonality of the time-one map.
On the other hand, the dispersive smoothing nature of the Schrodinger
evolution leads to the spreading property of the time-one map. The auditory
functions of Schroeder, Atal and Hall [13] appear as a nonlocal potential in
the Schrodinger equation. As a result, a class of orthogonal discrete auditory
transforms (ODAT) is generated. In searching for ODATs, an alternative
method based on the dilation equation of wavelets was also found; however,
that approach turned out to be too rigid to accommodate auditory properties,
e.g. spectral spreading across critical bands.

The paper is organized as follows. In section 2, the ODAT is derived from
the general DAT [?], and the ODAT construction is presented based on the
discrete Schrodinger equation. A specific ODAT is given by inserting the
auditory spreading functions of [13]. In section 3, auditory spectra of a
two-tone signal (with frequencies across a critical band) and of a vowel segment
are compared with their DFT counterparts to illustrate auditory spectral
spreading. Denoising with spectral thresholding is performed on voiced and
unvoiced speech segments. ODAT is found to increase the signal-to-noise ratio
beyond that of DFT when the noise content is relatively high. Concluding remarks
are made in section 4.



2 ODAT and Schrodinger 

Let $s = (s_0, \dots, s_{N-1})$ be a discrete real signal. The discrete Fourier transform
(DFT) is [1]:

$$\hat{s}_k = \sum_{n=0}^{N-1} s_n \, e^{-i 2\pi n k/N}. \tag{2.1}$$
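As a quick sanity check of the sign and index conventions in (2.1), the direct sum can be compared with NumPy's FFT, which uses the same unnormalized forward convention $e^{-i 2\pi nk/N}$. This is a minimal sketch; the random test signal and the helper name `dft_direct` are arbitrary choices for illustration.

```python
import numpy as np

def dft_direct(s):
    """Evaluate (2.1) literally: s_hat[k] = sum_n s[n] * exp(-i 2 pi n k / N)."""
    s = np.asarray(s, dtype=float)
    N = s.size
    n = np.arange(N)
    # The outer product n*k gives the full N x N matrix of exponents.
    F = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return F @ s

rng = np.random.default_rng(0)
s = rng.standard_normal(256)          # a real "frame" of length N = 256
s_hat = dft_direct(s)

# NumPy's forward FFT uses the same convention, so the two agree.
print(np.allclose(s_hat, np.fft.fft(s)))                   # True
# A real signal has a conjugate-symmetric spectrum: s_hat[N-k] = conj(s_hat[k]).
print(np.allclose(s_hat[1:][::-1], np.conj(s_hat[1:])))    # True
```

The conjugate symmetry checked in the last line is the property that the ODAT construction below must preserve.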

The general discrete auditory transform (DAT) is [?]:

$$S_{j,m} = \sum_{l=0}^{N-1} s_l \, K_{j-l,m}, \tag{2.2}$$

where the double-indexed kernel function is:

$$K_{l,m} = \sum_{n=0}^{N-1} X_{m,n} \, e^{i 2\pi l n/N}, \tag{2.3}$$

and the matrix $X_{m,n}$ has square sum equal to one in $m$:

$$\sum_{m=0}^{M-1} |X_{m,n}|^2 = 1, \quad \forall n. \tag{2.4}$$

Here M is on the order of N. 
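To make (2.2) to (2.4) concrete, the sketch below evaluates the DAT literally for a square ($M = N$) matrix $X$ and checks that choosing $X$ as the identity with $j = 0$ recovers the DFT. The helper name `dat` is hypothetical, and building an $N \times N \times M$ array of kernel values is chosen for readability, not speed.

```python
import numpy as np

def dat(s, X):
    """Hypothetical helper: return the DAT array S[j, m] of (2.2)-(2.3).

    s : real signal of length N
    X : M x N matrix whose columns have unit Euclidean norm, as in (2.4)
    """
    s = np.asarray(s, dtype=float)
    N = s.size
    idx = np.arange(N)
    # E[l, n] = exp(i 2 pi l n / N); (2.3) gives K[l, m] = sum_n X[m, n] E[l, n].
    E = np.exp(2j * np.pi * np.outer(idx, idx) / N)
    K = E @ X.T                                   # shape (N, M), row index l
    # (2.2): S[j, m] = sum_l s[l] K[(j - l) mod N, m]; K is N-periodic in l.
    lag = (idx[:, None] - idx[None, :]) % N       # lag[j, l] = (j - l) mod N
    return np.einsum("l,jlm->jm", s, K[lag])

rng = np.random.default_rng(1)
N = 16
s = rng.standard_normal(N)
S = dat(s, np.eye(N))
print(np.allclose(S[0], np.fft.fft(s)))   # True: j = 0, X = I recovers the DFT
```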

DFT is recovered from DAT by setting $j = 0$, $M = N$, and $X_{m,n}$ equal to the
$N \times N$ identity matrix. In case $X_{m,n}$ is a nontrivial orthogonal matrix,
let us still set $j = 0$ in (2.2) to find:

$$S_{0,m} = \sum_{l=0}^{N-1} s_l \sum_{n=0}^{N-1} X_{m,n} \, e^{-i 2\pi l n/N} = \sum_{n=0}^{N-1} X_{m,n} \, \hat{s}_n. \tag{2.5}$$

The mapping from $\hat{s}_n$ to $S_{0,m}$ is orthogonal. The problem reduces to finding
an orthogonal matrix $(X_{m,n})$ with auditory features.
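Equation (2.5) says that at $j = 0$ the DAT is simply $X$ applied to the DFT vector, so any orthogonal $X$ preserves the Euclidean length of $\hat{s}$. The sketch below checks both facts numerically; a random orthogonal matrix from a QR factorization stands in for the auditory $X$, which is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 32
s = rng.standard_normal(N)

# Stand-in for the auditory matrix: a random real orthogonal X (assumption).
X, _ = np.linalg.qr(rng.standard_normal((N, N)))

idx = np.arange(N)
E = np.exp(-2j * np.pi * np.outer(idx, idx) / N)   # E[l, n] = exp(-i 2 pi l n / N)

# Left-hand side of (2.5): the double sum over l and n.
S0 = np.einsum("l,mn,ln->m", s, X, E)
# Right-hand side of (2.5): X applied to the DFT vector s_hat.
s_hat = np.fft.fft(s)
print(np.allclose(S0, X @ s_hat))                              # True
# Orthogonality of X: the map s_hat -> S0 preserves Euclidean length.
print(np.isclose(np.linalg.norm(S0), np.linalg.norm(s_hat)))   # True
```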
Such a matrix acts on the complex numbers $\hat{s}_n$ (except for the modes $n = 0$ and
$n = N/2$, the so-called DC and Nyquist modes). Let us consider the time-one
map of the following spatially discrete Schrodinger equation:

$$i \, u_{n,t} = \sigma_1 (u_{n+1} - 2u_n + u_{n-1}) + \sigma_2 \sum_{m=1}^{N_h} V_{m,n} \, u_m, \tag{2.6}$$

where $\sigma_1$ and $\sigma_2$ are positive real numbers, $N_h = N/2 - 1$, and $(V_{m,n})$ is a
symmetric $N_h \times N_h$ matrix carrying auditory information of the ear.
For simplicity, the Dirichlet boundary condition ($u_0 = u_{N_h+1} = 0$) is imposed for the
evolution of equation (2.6). The discrete equations (2.6) can be cast in the matrix form:

$$i \, U_t = (\sigma_1 A + \sigma_2 B) \, U, \tag{2.7}$$

where $U = (u_1, u_2, \dots, u_{N_h})^T$, $A$ is the tridiagonal matrix ($-2$ on the diagonal,
$1$ on the two off-diagonals), and $B$ is the real symmetric matrix with entry $V_{m,n}$ at
$(m,n)$. The time-one map of (2.7), denoted by $T_w$, is simply $\exp\{-i(\sigma_1 A + \sigma_2 B)\}$,
which is orthogonal (unitary) because $\sigma_1 A + \sigma_2 B$ is real symmetric:
$T_w' T_w = \mathrm{Id}_{N_h}$, where the prime denotes the conjugate transpose.
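A minimal sketch of the time-one map, using only what (2.7) states: the solution is $U(t) = e^{-iHt}U(0)$ with $H = \sigma_1 A + \sigma_2 B$, and since $H$ is real symmetric it has an orthogonal eigendecomposition $H = Q \Lambda Q^T$, so $e^{-iH} = Q\, \mathrm{diag}(e^{-i\lambda_k})\, Q^T$. The placeholder $B$ below is a random symmetric matrix, not the auditory one, and the helper name `time_one_map` is hypothetical.

```python
import numpy as np

def time_one_map(B, sigma1=0.6, sigma2=0.04):
    """Hypothetical helper: T_w = exp(-i (sigma1*A + sigma2*B)) for (2.7)."""
    Nh = B.shape[0]
    # Dirichlet second-difference matrix: -2 on the diagonal, 1 off-diagonal.
    A = -2.0 * np.eye(Nh) + np.eye(Nh, k=1) + np.eye(Nh, k=-1)
    H = sigma1 * A + sigma2 * B
    lam, Q = np.linalg.eigh(H)            # real symmetric H: real lam, orthogonal Q
    return (Q * np.exp(-1j * lam)) @ Q.T  # Q diag(exp(-i lam)) Q^T

rng = np.random.default_rng(3)
Nh = 256 // 2 - 1                         # N = 256 as in section 3
R = rng.random((Nh, Nh))
B = 0.5 * (R + R.T)                       # placeholder symmetric B (not auditory)
T_w = time_one_map(B)

# Unitarity: T_w' T_w = Id, with ' the conjugate transpose.
print(np.allclose(T_w.conj().T @ T_w, np.eye(Nh)))   # True
```

The eigendecomposition route avoids a general matrix exponential and makes the unitarity visible: each eigenmode only picks up a phase.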

The matrix $B$ is built from auditory spreading functions [?], denoted by
$S(b(f_m), b(f_n))$, where $f_m$ is the frequency to spread from, $f_n$ is the frequency
to spread to, and $b$ is the standard mapping from Hertz (Hz) to the Bark scale
[7]. The functional form of $S(\cdot,\cdot)$ is given in [?]. Define
$V_{m,n} = \frac{1}{2}\big(S(b(f_m), b(f_n)) + S(b(f_n), b(f_m))\big)$, so that $B$ is the symmetric part of the matrix
$\big(S(b(f_m), b(f_n))\big)$. Numerical results based on this choice of $B$ are reported
in the next section.
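The functional forms of $S$ and $b$ are only cited here, so the sketch below makes explicit assumptions: the widely used Schroeder spreading function $10\log_{10} S(\Delta z) = 15.81 + 7.5(\Delta z + 0.474) - 17.5\sqrt{1 + (\Delta z + 0.474)^2}$ dB with $\Delta z = b(f_n) - b(f_m)$, the Zwicker-Terhardt Hz-to-Bark approximation for $b$, a 16 kHz sampling rate, and bin frequencies $f_m = m f_s/N$ for $m = 1, \dots, N_h$. These are illustrative choices, not necessarily the ones behind the reported figures; the helper names are hypothetical.

```python
import numpy as np

def bark(f_hz):
    """Zwicker-Terhardt Hz-to-Bark approximation (assumed choice of b)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def schroeder_spread(dz):
    """Schroeder spreading function, converted from dB to a linear factor.

    dz = b(f_to) - b(f_from) in Bark; positive dz means spreading upward.
    """
    y = dz + 0.474
    db = 15.81 + 7.5 * y - 17.5 * np.sqrt(1.0 + y ** 2)
    return 10.0 ** (db / 10.0)

def auditory_B(N=256, fs=16000.0):
    """Hypothetical helper: symmetric V[m, n] on DFT bins m, n = 1..Nh."""
    Nh = N // 2 - 1
    f = np.arange(1, Nh + 1) * fs / N              # assumed bin frequencies in Hz
    z = bark(f)
    S = schroeder_spread(z[None, :] - z[:, None])  # S[m, n]: from f_m to f_n
    return 0.5 * (S + S.T)                         # symmetric part, as in the text

B = auditory_B()
print(B.shape, np.allclose(B, B.T))   # (127, 127) True
```

The Schroeder form decays more slowly for $\Delta z > 0$ than for $\Delta z < 0$, which is the upward-spreading asymmetry that the symmetrization partly averages out.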

The matrix $X = (X_{m,n})$ takes the block diagonal form:

$$X = \mathrm{diag}\{1, T_w, 1, \tilde{T}_w^{*}\}, \tag{2.8}$$

where the tilde denotes the reverse permutation of columns of $T_w$ and $*$ denotes
complex conjugation, so that the spreading occurs symmetrically on the DFT components
($\hat{s}_l$, $N_h + 2 \le l \le N - 1$) and the conjugate symmetry of the spectrum is preserved.
The matrix $X$ is clearly orthogonal (unitary) and leaves the DC and Nyquist modes invariant.
The ODAT matrix is the product of $X$ and the DFT matrix.
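Putting the pieces together, here is a hedged sketch of the ODAT as $X$ times the DFT matrix. Two assumptions are made: the DFT is taken in its unitary form (`norm="ortho"`) so that the whole map is unitary, and the block for bins $N_h+2, \dots, N-1$ is built by reversing both the rows and the columns of the conjugate $\overline{T_w}$, one indexing convention that guarantees $Y_{N-l} = \overline{Y_l}$ for real input. The placeholder $T_w$ comes from a random symmetric $H$ rather than the auditory $B$; all function names are hypothetical.

```python
import numpy as np

def odat_matrix_X(T_w):
    """Hypothetical helper: X = diag{1, T_w, 1, mirrored conj(T_w)} of size N."""
    Nh = T_w.shape[0]
    N = 2 * (Nh + 1)
    X = np.zeros((N, N), dtype=complex)
    X[0, 0] = 1.0                                   # DC mode left invariant
    X[1:Nh + 1, 1:Nh + 1] = T_w                     # bins 1 .. Nh
    X[Nh + 1, Nh + 1] = 1.0                         # Nyquist mode n = N/2 invariant
    X[Nh + 2:, Nh + 2:] = np.conj(T_w)[::-1, ::-1]  # bins Nh+2 .. N-1, mirrored
    return X

def odat(s, X):
    """Forward ODAT: apply X to the unitary DFT of s."""
    return X @ np.fft.fft(s, norm="ortho")

def inverse_odat(Y, X):
    """Inverse ODAT: X is unitary, so undo it with its conjugate transpose."""
    return np.fft.ifft(X.conj().T @ Y, norm="ortho")

rng = np.random.default_rng(4)
N = 64
Nh = N // 2 - 1
R = rng.random((Nh, Nh))
lam, Q = np.linalg.eigh(0.5 * (R + R.T))   # placeholder real symmetric H
T_w = (Q * np.exp(-1j * lam)) @ Q.T        # T_w = exp(-iH)
X = odat_matrix_X(T_w)

s = rng.standard_normal(N)
Y = odat(s, X)
print(np.allclose(Y[1:][::-1], np.conj(Y[1:])))         # conjugate symmetry kept
print(np.allclose(inverse_odat(Y, X).real, s))           # perfect reconstruction
print(np.isclose(np.linalg.norm(Y), np.linalg.norm(s)))  # energy preserved
```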

The continuum version of (2.6) is:

$$i \, u_t = \Delta_x u + V(x) * u, \quad x \in \mathbb{R}^n, \; n \ge 1, \tag{2.9}$$
where $*$ denotes convolution and $V(x)$ is real and even. The $L^2$ norm of $u$ is conserved
in time. Schrodinger equations analogous to (2.9) have been much studied
regarding smoothing (scattering) properties and derivation from particle
dynamics [?], among others. When the convolution term is cubically
nonlinear in $u$, the equation is known as the Schrodinger-Hartree equation [?].
In [?], the smoothing and spreading property is measured in the weighted
norm $\|u\|_{m,s} = \|(1+|x|^2)^{s/2}(1-\Delta)^{m/2} u\|_2$, with $\Delta$ the spatial Laplacian.
Solutions at time $t \ne 0$ satisfy the bound:

$$\|u(t)\|_{1,-1} \le C(\|u(0)\|_{0,1}) \, (|t| + |t|^{-1}). \tag{2.10}$$

We shall see in the next section that the time-one Schrodinger map $T_w$ inherits
the smoothing and spreading property of the continuum case.
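Why is the $L^2$ norm conserved in (2.9)? On a periodic grid (an assumption made here only for illustration, since (2.9) is posed on $\mathbb{R}^n$), both $\Delta_x$ and convolution with a real, even $V$ are diagonal in Fourier space with real symbols $-|k|^2$ and $\hat V(k)$, so each Fourier mode only acquires a phase. The one-dimensional sketch below uses a Gaussian $V$, an arbitrary choice, and also shows a wave packet spreading in time.

```python
import numpy as np

# Periodic 1-D grid (illustrative assumption; (2.9) is posed on R^n).
L, M = 40.0, 512
dx = L / M
x = -L / 2 + dx * np.arange(M)
k = 2 * np.pi * np.fft.fftfreq(M, d=dx)           # angular wavenumbers

# Real, even kernel V(x) = exp(-x^2) (arbitrary choice), shifted so x = 0 sits
# at index 0; its discrete Fourier transform is then real up to round-off.
V = np.exp(-x ** 2)
V_hat = (np.fft.fft(np.fft.ifftshift(V)) * dx).real

# Fourier symbol of the right-hand side Delta u + V * u: real-valued.
symbol = -k ** 2 + V_hat

def evolve(u0, t):
    """Exact solution of i u_t = u_xx + V*u on the periodic grid."""
    return np.fft.ifft(np.exp(-1j * symbol * t) * np.fft.fft(u0))

def l2(u):
    return np.sqrt(np.sum(np.abs(u) ** 2) * dx)

def width(u):
    p = np.abs(u) ** 2
    return np.sqrt(np.sum(x ** 2 * p) / np.sum(p))

u0 = np.exp(-x ** 2 / 2).astype(complex)
u1 = evolve(u0, 1.0)
print(np.isclose(l2(u1), l2(u0)))   # True: only phases change
print(width(u1) > width(u0))        # True: the packet spreads
```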

3 Numerical Tests 

The computation is carried out in Matlab, with ODAT parameters $(\sigma_1, \sigma_2) =
(0.6, 0.04)$ and discrete signal (frame) length $N = 256$. First consider a two-tone
signal consisting of sinusoids of frequencies 3 kHz (kilohertz) and 4.3
kHz with identical amplitudes. The two frequency values span a critical
band. Figure 1 compares the ODAT (dashed) and DFT (solid) log-magnitude
spectra. The ODAT spectral peak regions are lower and wider than those of the DFT.
There is also more spreading in the ODAT spectrum towards higher frequencies,
consistent with the upward masking property of the human ear [?]. This can be
explained by the weighted norm estimate (2.10), where large $x$ corresponds to
large frequency. Figure 2 shows the ODAT and DFT spectra of a vowel segment
containing multiple harmonics; spectral smoothing is observed again.
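The two-tone experiment can be outlined in a self-contained sketch. The sampling rate and the exact spreading and Bark formulas are not restated in this section, so the code assumes $f_s = 16$ kHz, the Zwicker-Terhardt Bark formula, the Schroeder spreading function in linear units, and the block layout of (2.8) with a mirrored conjugate block. It prints peak levels of the two log-magnitude spectra rather than plotting them and makes no claim to reproduce Figure 1.

```python
import numpy as np

N, fs = 256, 16000.0                 # frame length from the text; fs is assumed
sigma1, sigma2 = 0.6, 0.04           # ODAT parameters from the text
Nh = N // 2 - 1

def bark(f):
    # Zwicker-Terhardt approximation (assumed form of b)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def spread(dz):
    # Schroeder spreading function, dB -> linear (assumed form of S)
    y = dz + 0.474
    return 10.0 ** ((15.81 + 7.5 * y - 17.5 * np.sqrt(1.0 + y ** 2)) / 10.0)

# Auditory matrix B on bins 1..Nh, symmetrized as in the text.
z = bark(np.arange(1, Nh + 1) * fs / N)
S = spread(z[None, :] - z[:, None])
B = 0.5 * (S + S.T)

# Time-one map T_w = exp(-i (sigma1 A + sigma2 B)) via eigendecomposition.
A = -2.0 * np.eye(Nh) + np.eye(Nh, k=1) + np.eye(Nh, k=-1)
lam, Q = np.linalg.eigh(sigma1 * A + sigma2 * B)
T_w = (Q * np.exp(-1j * lam)) @ Q.T

# Block-diagonal X of (2.8), with a mirrored conjugate block for bins Nh+2..N-1.
X = np.zeros((N, N), dtype=complex)
X[0, 0] = X[Nh + 1, Nh + 1] = 1.0
X[1:Nh + 1, 1:Nh + 1] = T_w
X[Nh + 2:, Nh + 2:] = np.conj(T_w)[::-1, ::-1]

# Two equal-amplitude tones at 3 kHz and 4.3 kHz.
t = np.arange(N) / fs
s = np.sin(2 * np.pi * 3000 * t) + np.sin(2 * np.pi * 4300 * t)

dft = np.fft.fft(s, norm="ortho")
odat = X @ dft
half = slice(0, N // 2 + 1)
dft_db = 20 * np.log10(np.abs(dft[half]) + 1e-12)
odat_db = 20 * np.log10(np.abs(odat[half]) + 1e-12)
print(f"peak level: DFT {dft_db.max():.1f} dB, ODAT {odat_db.max():.1f} dB")
```

Plotting `dft_db` and `odat_db` against `np.arange(N // 2 + 1) * fs / N` gives a picture comparable in kind to Figure 1, under the assumptions above.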

ODAT and DFT were used to denoise speech signals via the thresholding
method in the transformed domain [?]. The aim is to improve the
signal-to-noise ratio (SNR) of noisy speech. The premise of the method is that
low-level components in the transformed domain are more likely to be noise
than signal plus noise, so thresholding can improve the overall SNR of
the signal. This simple thresholding method serves to illustrate the difference
between ODAT and DFT in signal processing. A vowel and a consonant
speech segment were selected, each with 512 data points. Noisy
speech was created by adding Gaussian noise to the selected segments. The
noise level was set to produce SNRs ranging from -12 decibels (dB)
to +12 dB in 3 dB steps. ODAT and DFT were applied to the
noisy speech signals. The magnitudes of the transformed components were then
compared to a threshold, and all components with magnitude smaller than the
threshold were discarded in the reconstruction of the signal. The threshold
was computed as the average of the DFT magnitude spectrum. The signal was
reconstructed by the inverse ODAT and the inverse DFT, respectively. The SNRs of the
reconstructed vowel signal are plotted against input SNRs in Figure 3, and those
of the reconstructed consonant signal in Figure 4.
We see that ODAT (solid) improves over DFT (dashdot) in terms of SNR
when the noise level is relatively high, particularly in the case of consonants,
which resemble noise more than vowels do.
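The thresholding procedure above can be sketched as follows. The speech segments are not available here, so a synthetic harmonic signal of 512 samples stands in for a vowel (an assumption), and a 16 kHz sampling rate is assumed. The transform pair is passed in as functions, so the same loop runs for the unitary DFT shown here or for an ODAT built as in section 2. The threshold is the mean DFT magnitude, taken here of the noisy input (our reading of the text), and coefficients below it are zeroed before inversion. Helper names are hypothetical.

```python
import numpy as np

def add_noise(clean, snr_db, rng):
    """Add white Gaussian noise scaled to a target input SNR in dB."""
    noise = rng.standard_normal(clean.size)
    p_sig = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

def snr_db(clean, estimate):
    """Output SNR of an estimate relative to the clean signal, in dB."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((clean - estimate) ** 2))

def threshold_denoise(noisy, forward, inverse, threshold):
    """Zero transform coefficients whose magnitude is below the threshold."""
    coeffs = forward(noisy)
    coeffs = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
    return np.real(inverse(coeffs))

# Unitary DFT pair; an ODAT pair would be X @ fft(.) and ifft(X^H @ .).
fwd = lambda v: np.fft.fft(v, norm="ortho")
inv = lambda c: np.fft.ifft(c, norm="ortho")

rng = np.random.default_rng(5)
n = np.arange(512)
fs = 16000.0                                   # assumed sampling rate
clean = sum(np.sin(2 * np.pi * 150 * h * n / fs) / h for h in range(1, 6))

for in_snr in range(-12, 13, 3):               # -12 dB to +12 dB in 3 dB steps
    noisy = add_noise(clean, in_snr, rng)
    thr = np.mean(np.abs(fwd(noisy)))          # mean DFT magnitude of noisy input
    out = threshold_denoise(noisy, fwd, inv, thr)
    print(f"input {in_snr:+3d} dB -> output {snr_db(clean, out):5.1f} dB")
```

Because both transform pairs are unitary in this sketch, the same threshold value is comparable across DFT and ODAT coefficients.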

The noise-reduction advantage can be attributed to the spectral spreading
property of ODAT. The redundant DAT [?] is quite similar in this respect.
Redundancy, however, yields more modes in the transformed domain, and
was observed to provide more SNR gain in denoising tests. It would be interesting
to find out how to enhance the amount of smoothing for ODAT in future
work.

It would be rewarding to investigate how well a nonlinear nonlocal Schrodinger
equation can model the ear's nonlinear responses. The ear's responses are
nonlinear and nonlocal in nature, and physiological models of the ear are dispersive,
nonlinear and nonlocal [5, ?].

4 Concluding Remarks 

Orthogonal discrete auditory transforms (ODAT) are introduced based on
nonlocal spatially discrete Schrodinger equations. Dispersive smoothing,
mass conservation, and robustness of the Schrodinger equation allow one to
inject auditory knowledge into the transform while preserving orthogonality.
Numerical tests on two-tone and speech segments demonstrate the spectral
spreading property of ODAT and its advantage in denoising. Future work will
explore efficient ways to enhance spectral spreading for ODATs, as well as more
complex signal processing applications.
Figure 1: Comparison of ODAT spectrum (solid) and DFT spectrum (dash)
of a two-tone signal of frequencies (3, 4.3) kHz and identical amplitudes.
ODAT's spectral spreading appears near the peak areas and towards the
higher frequencies. ODAT parameters $(\sigma_1, \sigma_2) = (0.6, 0.04)$.
Figure 2: Comparison of ODAT spectrum (solid) and DFT spectrum (dash)
of a vowel segment. ODAT parameters $(\sigma_1, \sigma_2) = (0.6, 0.04)$.
Figure 3: Comparison of ODAT (solid) and DFT (dashdot) denoising by
spectral thresholding for a vowel segment. The spectral spreading property of
ODAT helps to increase signal content when the noise level is relatively high,
e.g. input SNR below 7 decibels (dB).
Figure 4: Comparison of ODAT (solid) and DFT (dashdot) denoising by
spectral thresholding for a consonant segment. The spectral spreading property of
ODAT helps to increase signal content when the noise level is relatively high,
e.g. input SNR below zero decibels (dB).